@scttcper
Member

Replace N+1 get_or_create queries with batch operations:

  • Batch fetch existing authors with single filter(email__in=...)
  • Batch create missing authors with bulk_create(ignore_conflicts=True)
  • Batch update changed names with bulk_update

Reduces query count from 2N+ to at most 4 queries regardless of author count, improving performance for releases with many unique commit authors.
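The three batch steps above can be sketched outside Django with the standard-library `sqlite3` module. This is a minimal illustration, not the code merged in this PR: the `author` table, the `SCHEMA` constant, and the `sync_authors` helper are hypothetical names, and the raw SQL only approximates what `filter(email__in=...)`, `bulk_create(ignore_conflicts=True)`, and `bulk_update` would issue. The extra re-fetch after the insert is an assumption about where the fourth query comes from, since Django does not set primary keys on instances created with `ignore_conflicts=True`.

```python
import sqlite3

# Hypothetical schema for illustration; the real CommitAuthor table differs.
SCHEMA = """
CREATE TABLE author (
    id INTEGER PRIMARY KEY,
    org_id INTEGER NOT NULL,
    email TEXT NOT NULL,
    name TEXT,
    UNIQUE (org_id, email)
)
"""


def _placeholders(n):
    """Return 'n' comma-separated SQL parameter markers, e.g. '?, ?, ?'."""
    return ", ".join("?" * n)


def _fetch(conn, org_id, emails):
    """Return {email: (id, name)} for the given emails using one SELECT."""
    rows = conn.execute(
        "SELECT id, email, name FROM author "
        f"WHERE org_id = ? AND email IN ({_placeholders(len(emails))})",
        [org_id, *emails],
    )
    return {email: (row_id, name) for row_id, email, name in rows}


def sync_authors(conn, org_id, name_by_email):
    """Create or update authors in at most four statements; return {email: id}."""
    emails = list(name_by_email)
    if not emails:
        return {}

    # Query 1: fetch every already-known author in a single SELECT.
    existing = _fetch(conn, org_id, emails)

    # Work out which pre-existing rows carry a stale name.
    changed = [
        (row_id, name_by_email[email])
        for email, (row_id, name) in existing.items()
        if name != name_by_email[email]
    ]

    missing = [email for email in emails if email not in existing]
    if missing:
        # Query 2: one multi-row INSERT; conflicting rows are skipped.
        rows_sql = ", ".join(["(?, ?, ?)"] * len(missing))
        params = [v for e in missing for v in (org_id, e, name_by_email[e])]
        conn.execute(
            f"INSERT OR IGNORE INTO author (org_id, email, name) VALUES {rows_sql}",
            params,
        )
        # Query 3: re-fetch the new rows to learn their ids.
        existing.update(_fetch(conn, org_id, missing))

    if changed:
        # Query 4: one UPDATE with a CASE expression covering every changed row.
        cases = " ".join(["WHEN ? THEN ?"] * len(changed))
        params = [v for pair in changed for v in pair]
        ids = [row_id for row_id, _ in changed]
        conn.execute(
            f"UPDATE author SET name = CASE id {cases} END "
            f"WHERE id IN ({_placeholders(len(ids))})",
            [*params, *ids],
        )

    return {email: row_id for email, (row_id, _) in existing.items()}
```

The statement count stays constant as the number of authors grows. A real implementation would likely need to chunk very large email lists, because databases cap the number of bound parameters per statement.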

Sentry issue for the N+1 query:
https://sentry.sentry.io/issues/6234355863/

scttcper and others added 2 commits January 14, 2026 13:01
Replace N+1 get_or_create queries with batch operations:
- Batch fetch existing authors with single filter(email__in=...)
- Batch create missing authors with bulk_create(ignore_conflicts=True)
- Batch update changed names with bulk_update

Reduces query count from 2N+ to at most 4 queries regardless of
author count, significantly improving performance for releases
with many unique commit authors.

Co-Authored-By: Claude <[email protected]>
@scttcper requested review from a team and wedamija January 14, 2026 21:22
@github-actions bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) Jan 14, 2026
Member

@shashjar left a comment

couple of minor comments

data["_normalized_email"] = author_email

if author_email and author_email not in author_data_by_email:
    author_data_by_email[author_email] = {"name": data.get("author_name")}

Can the above section be compressed into the below?

author_email = truncatechars(author_email, 75)
if author_email:
    # Lowercase to match CommitAuthorManager.get_or_create behavior
    author_email = author_email.lower()

    # Store normalized email back in data for later lookup
    data["_normalized_email"] = author_email

    if author_email not in author_data_by_email:
        author_data_by_email[author_email] = {"name": data.get("author_name")}
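
For context, the suggested block can be sketched as a standalone collection pass over the commit list. This is an illustration under assumptions: `collect_authors` and `truncate` are hypothetical names, and `truncate` is a plain slice standing in for `truncatechars`, which may mark truncated values differently.

```python
MAX_EMAIL_LENGTH = 75  # same limit as in the suggestion above


def truncate(value, limit):
    """Hypothetical stand-in for truncatechars: a plain slice that passes None through."""
    return value[:limit] if value else value


def collect_authors(commit_list):
    """Build {normalized_email: {"name": ...}}, keeping the first name seen per email."""
    author_data_by_email = {}
    for data in commit_list:
        author_email = truncate(data.get("author_email"), MAX_EMAIL_LENGTH)
        if author_email:
            # Lowercase so that differently cased emails map to one author.
            author_email = author_email.lower()

            # Store the normalized email on the commit for later lookup.
            data["_normalized_email"] = author_email

            if author_email not in author_data_by_email:
                author_data_by_email[author_email] = {"name": data.get("author_name")}
    return author_data_by_email
```

Commits without an email are skipped entirely, so they never receive a `_normalized_email` key; any later lookup should read it with `data.get("_normalized_email")`.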

)


def create_commit_authors(commit_list, release):

Up to you, but is it worth adding a test to verify the batching behavior, or anything else related to commit authors?

- Simplify code by moving _normalized_email assignment inside if block
- Add test_multiple_authors to verify batch author creation, reuse,
  and name updates

Co-Authored-By: Claude <[email protected]>
@scttcper enabled auto-merge (squash) January 15, 2026 22:18
@scttcper merged commit 656d194 into master Jan 15, 2026
65 checks passed
@scttcper deleted the scttcper/perf-batch-commit-authors branch January 15, 2026 22:33
Labels

Scope: Backend Automatically applied to PRs that change backend components

3 participants